
Large-scale analysis of Zipf's law in English texts


Abstract

Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted with a representatively large number of texts. The current support for Zipf's law in texts can therefore be summarized as anecdotal. We try to solve these issues by studying three different versions of Zipf's law and fitting them to all available English texts in the Project Gutenberg database (consisting of more than 30000 texts). To do so, we use state-of-the-art tools in fitting and goodness-of-fit tests, carefully tailored to the peculiarities of text statistics. Remarkably, one of the three versions of Zipf's law, consisting of a pure power-law form in the complementary cumulative distribution function of word frequencies, is able to fit more than 40% of the texts in the database (at the 0.05 significance level), for the whole domain of frequencies (from 1 to the maximum value) and with only one free parameter (the exponent).
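To make the best-fitting version more concrete, the following Python sketch fits a one-parameter power law to the word frequencies of a single text. It is an illustration under stated assumptions, not the authors' procedure: it assumes the complementary cumulative distribution has the form S(n) = P(frequency >= n) = n^(-a) for n = 1, 2, ..., so that the probability of frequency n is n^(-a) - (n+1)^(-a); it estimates the exponent a by maximum likelihood with a golden-section search; and it reports a Kolmogorov-Smirnov distance only. The tokenizer and all function names are hypothetical, and turning the distance into a p-value (for a test at the 0.05 level) would need an additional simulation step that is not shown.

```python
import math
import re
from collections import Counter


def word_frequency_counts(text):
    """Return one absolute frequency per distinct word (crude tokenizer, an assumption)."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(Counter(words).values())


def log_likelihood(a, freq_hist):
    """Log-likelihood under the assumed pmf f(n) = n**(-a) - (n + 1)**(-a).

    freq_hist maps a frequency n to the number of distinct words with that frequency.
    """
    return sum(k * math.log(n ** -a - (n + 1) ** -a) for n, k in freq_hist.items())


def fit_exponent(freqs, lo=0.01, hi=5.0, tol=1e-6):
    """Maximize the log-likelihood over a in [lo, hi] by golden-section search."""
    hist = Counter(freqs)
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if log_likelihood(c, hist) > log_likelihood(d, hist):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


def ks_distance(freqs, a):
    """Largest gap between the empirical S(n) and the fitted n**(-a) over integers n >= 1."""
    hist = Counter(freqs)
    total = len(freqs)
    remaining = total  # number of distinct words with frequency >= current n
    dist = 0.0
    prev = 0
    for n in sorted(hist):
        # The empirical S is constant on prev + 1 .. n, so check both ends of that run.
        s_emp = remaining / total
        dist = max(dist, abs(s_emp - (prev + 1) ** -a), abs(s_emp - n ** -a))
        remaining -= hist[n]
        prev = n
    # Beyond the largest observed frequency the empirical S is zero.
    return max(dist, (prev + 1) ** -a)
```

For a text held in a string `text`, one would call `freqs = word_frequency_counts(text)`, then `a = fit_exponent(freqs)` and `ks_distance(freqs, a)`. The search interval for the exponent is an arbitrary choice for this sketch.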
